Homogeneous function


Extended convexity and smoothness and their applications in deep learning

Qi, Binchuan

arXiv.org Artificial Intelligence

The underlying mechanism by which simple gradient-based iterative algorithms can effectively handle the non-convex problem of deep model training remains incompletely understood within the traditional convex and non-convex analysis frameworks, which often require Lipschitz smoothness of the gradient and strong convexity of the objective. In this paper, we introduce $\mathcal{H}(\phi)$-convexity and $\mathcal{H}(\Phi)$-smoothness, which broaden the existing concepts of smoothness and convexity, and delineate their fundamental properties. Building on these concepts, we introduce the high-order gradient descent and high-order stochastic gradient descent methods, which serve as extensions of the traditional gradient descent and stochastic gradient descent methods, respectively. Furthermore, we establish descent lemmas for $\mathcal{H}(\phi)$-convex and $\mathcal{H}(\Phi)$-smooth objective functions under these four methods. On the basis of these findings, we develop the gradient structure control algorithm to address non-convex optimization objectives, encompassing both the functions represented by machine learning models and common loss functions in deep learning. The effectiveness of the proposed methodology is empirically validated through experiments.
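The abstract does not specify the update rules of the high-order variants, so as context it may help to recall the classical baseline they extend: plain gradient descent, whose convergence analysis rests on the standard descent lemma for Lipschitz-smooth gradients. A minimal sketch (the function names here are illustrative, not from the paper):

```python
import numpy as np

def gradient_descent(grad, x0, lr, steps):
    """Classical gradient descent: x_{t+1} = x_t - lr * grad(x_t).

    Under L-Lipschitz smoothness of the gradient and lr <= 1/L, the
    descent lemma guarantees f(x_{t+1}) <= f(x_t); the paper's
    high-order methods generalize this baseline to the broader
    H(phi)-convex / H(Phi)-smooth function classes.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        x = x - lr * grad(x)
    return x

# Example: f(x) = ||x||^2 / 2 has grad f(x) = x and minimizer 0.
x_star = gradient_descent(lambda x: x, x0=[3.0, -4.0], lr=0.1, steps=200)
```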


Homogeneous Artificial Neural Network

Polyakov, Andrey

arXiv.org Artificial Intelligence

The universal approximation theorems [4], [10], [9] put limits on what artificial neural networks (ANNs) can theoretically learn. These theorems guarantee the existence of an ANN that approximates a continuous function on a compact set with arbitrarily high precision. Training of the ANN is likewise based on compactly supported data, while the trained ANN may subsequently be used to predict the function value for input data that do not belong to the training set. Sometimes the new input may lie at a rather large distance even from the convex hull of the training set. In the latter case, the ANN is used as an extrapolator of the function, and the approximation theorems are not applicable. Moreover, analysis of the extrapolation error is impossible if there is no information about the function away from the training set. Therefore, a global extrapolation of a function based on local data can be provided only under an additional assumption about the class of functions approximated by the ANN. This paper deals with the approximation of so-called generalized homogeneous functions [24], [12], [2], [21] and introduces the corresponding homogeneous artificial neural network, whose key feature is a global approximation based on local data. Generalized homogeneity is a symmetry of an object (a function, a set, a vector field, etc.) with respect to a group of so-called generalized dilations.
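To see why homogeneity enables global extrapolation from local data, consider the simplest special case, classical homogeneity under the uniform dilation: f(s·x) = s^ν f(x) for s > 0. Knowing f on the unit sphere then determines it everywhere. The paper works with generalized dilations; this hypothetical sketch uses only the classical case:

```python
import numpy as np

def extrapolate_homogeneous(f_local, x, degree):
    """Evaluate a homogeneous function arbitrarily far from the data.

    Assumes f(s*x) = s**degree * f(x) for s > 0 (classical homogeneity,
    the simplest instance of the generalized dilations in the paper).
    f_local only needs to be known on the unit sphere.
    """
    r = np.linalg.norm(x)
    if r == 0.0:
        return 0.0 if degree > 0 else f_local(x)
    # Rescale the query point onto the unit sphere, then scale back.
    return r**degree * f_local(x / r)

# Example: f(x) = ||x||**3 is homogeneous of degree 3, and its
# restriction to the unit sphere is the constant function 1.
f_on_sphere = lambda u: 1.0
val = extrapolate_homogeneous(f_on_sphere, np.array([3.0, 4.0]), degree=3)
```

Since ||(3, 4)|| = 5, the exact value is 5³ = 125; in the paper's setting the role of `f_local` is played by an ANN trained on the compactly supported data.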


The Asymmetric Maximum Margin Bias of Quasi-Homogeneous Neural Networks

Kunin, Daniel, Yamamura, Atsushi, Ma, Chao, Ganguli, Surya

arXiv.org Artificial Intelligence

In this work, we explore the maximum-margin bias of quasi-homogeneous neural networks trained with gradient flow on an exponential loss and past a point of separability. We introduce the class of quasi-homogeneous models, which is expressive enough to describe nearly all neural networks with homogeneous activations, even those with biases, residual connections, and normalization layers, while structured enough to enable geometric analysis of its gradient dynamics. Using this analysis, we generalize the existing results of maximum-margin bias for homogeneous networks to this richer class of models. We find that gradient flow implicitly favors a subset of the parameters, unlike in the case of a homogeneous model where all parameters are treated equally. We demonstrate through simple examples how this strong favoritism toward minimizing an asymmetric norm can degrade the robustness of quasi-homogeneous models. On the other hand, we conjecture that this norm-minimization discards, when possible, unnecessary higher-order parameters, reducing the model to a sparser parameterization. Lastly, by applying our theorem to sufficiently expressive neural networks with normalization layers, we reveal a universal mechanism behind the empirical phenomenon of Neural Collapse.
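The notion of homogeneity in parameters that this class generalizes can be checked directly: for a bias-free network with homogeneous activations such as ReLU, scaling all parameters by c > 0 scales the output by c^L, where L is the depth. Adding biases breaks this exact symmetry, which is what motivates the quasi-homogeneous class. A small sketch of the bias-free case (the setup below is illustrative, not from the paper):

```python
import numpy as np

def relu_net(params, x):
    """Two-layer, bias-free ReLU network: f(x) = W2 @ relu(W1 @ x)."""
    W1, W2 = params
    return W2 @ np.maximum(W1 @ x, 0.0)

rng = np.random.default_rng(0)
W1 = rng.standard_normal((4, 3))
W2 = rng.standard_normal((2, 4))
x = rng.standard_normal(3)

# Scaling every parameter by c scales the output by c**2 (depth L = 2),
# because relu(c * z) = c * relu(z) for c > 0.
c = 3.0
out_base = relu_net((W1, W2), x)
out_scaled = relu_net((c * W1, c * W2), x)
```

A bias term b in the first layer would contribute relu(c·W1·x + b) rather than c·relu(W1·x + b/c), so the scaled output is no longer c² times the original; quasi-homogeneous models assign such parameters different scaling weights instead of treating all parameters equally.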